Toward a generic representation of random variables for machine learning
Authors
Abstract
This paper presents a pre-processing step and a distance that improve the performance of machine learning algorithms working on independent and identically distributed stochastic processes. We introduce a novel non-parametric approach to representing random variables that separates dependency from distribution without losing any information. We also propose an associated metric that leverages this representation, together with its statistical estimate. Besides experiments on synthetic datasets, the benefits of our contribution are illustrated through the clustering of financial time series, for instance prices from the credit default swaps market. Results are available at www.datagrapple.com, and an IPython Notebook tutorial is available at www.datagrapple.com/Tech for reproducible research.
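The idea of separating dependency from distribution without losing information can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the dependence part is taken to be the ranks of a sample, and the distribution part its sorted values. The function names `split_sample`, `rebuild_sample`, and `blended_distance`, the weight `theta`, and the particular blend of the two parts are hypothetical and chosen only for illustration; they are not the paper's exact estimator.

```python
import numpy as np


def split_sample(x):
    """Split a sample into ranks (ordering) and sorted values (distribution)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=int)
    ranks[order] = np.arange(len(x))  # ranks[i] = position of x[i] once sorted
    sorted_values = x[order]          # empirical quantiles of the sample
    return ranks, sorted_values


def rebuild_sample(ranks, sorted_values):
    """Recover the original sample, showing the split loses no information."""
    return sorted_values[ranks]


def blended_distance(x, y, theta=0.5):
    """Illustrative distance (an assumption, not the paper's metric):
    theta weights the rank gap, (1 - theta) weights the quantile gap.
    Both samples must have the same length."""
    rx, sx = split_sample(x)
    ry, sy = split_sample(y)
    n = len(rx)
    dependence_gap = np.mean(((rx - ry) / n) ** 2)
    distribution_gap = np.mean((sx - sy) ** 2)
    return float(np.sqrt(theta * dependence_gap + (1 - theta) * distribution_gap))


if __name__ == "__main__":
    x = np.array([3.0, 1.0, 2.0])
    ranks, sorted_values = split_sample(x)
    print(ranks, sorted_values)
    print(rebuild_sample(ranks, sorted_values))
```

With `theta=1` only the ordering of the observations matters, and with `theta=0` only their values do, so two series with identical values in a different order are at distance zero under `theta=0` but not under `theta=1`.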
Similar resources
Image Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing, where many low-level representations of images exist, e.g., SIFT, HOG and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis are employed to d...
Application of ensemble learning techniques to model the atmospheric concentration of SO2
For pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
Machine learning algorithms for time series in financial markets
This research examines the usefulness of different machine learning methods for forecasting time series in financial markets. The main issue in this field is that economic managers and the scientific community are still looking for more accurate forecasting algorithms. Meeting this need leads to an increase in forecasting quality and, therefore, more profitability and efficiency. In this pa...
The 22nd International Conference on Machine Learning
Support Vector Machines (SVMs) have been one of the major breakthroughs in machine learning, both in terms of their practical success as well as their learning-theoretic properties. This talk presents a generic extension of SVM classification to the case of structured classification, i.e. the task of predicting output variables with some meaningful internal structure. As we will show, this appr...
Representation of Gender Roles in Child and Young Characters in Game of Thrones Series
The purpose of this study is to demonstrate how we, especially children and adolescents, are influenced by the media; this issue was investigated by analyzing the representation of gender roles in the Game of Thrones series. The study was based on three theories: social learning, socialization and cultivation. The research used quantitative content analysis. The variables included 20 gender a...
Journal title:
- Pattern Recognition Letters
Volume 70, Issue
Pages -
Publication date 2016